AITopics | tts technology

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimizing Multilingual Text-To-Speech with Accents & Emotions

Pawar, Pranav, Dwivedi, Akshansh, Boricha, Jenish, Gohil, Himanshu, Dubey, Aditya

arXiv.org Artificial Intelligence, Jun-23-2025

Although state-of-the-art text-to-speech (TTS) systems achieve high naturalness in monolingual settings, synthesizing speech with correct multilingual accents (especially for Indic languages) and context-relevant emotions remains difficult, owing to cultural nuance discrepancies in current frameworks. This paper introduces a new TTS architecture that integrates accent modelling and transliteration preservation with multi-scale emotion modelling, tuned in particular for Hindi and Indian English accents. Our approach extends the Parler-TTS model with a language-specific phoneme-alignment hybrid encoder-decoder architecture, culture-sensitive emotion embedding layers trained on native-speaker corpora, and dynamic accent code switching with residual vector quantization. Quantitative tests demonstrate a 23.7% improvement in accent accuracy (Word Error Rate reduction from 15.4% to 11.8%) and 85.3% emotion recognition accuracy from native listeners, surpassing the METTS and VECL-TTS baselines. The novelty of the system is that it can code-mix in real time, generating statements such as "Namaste, let's talk about " with uninterrupted accent shifts while preserving emotional consistency. Subjective evaluation with 200 users reported a mean opinion score (MOS) of 4.2/5 for cultural correctness, significantly better than existing multilingual systems (p<0.01). This research makes cross-lingual synthesis more feasible by showcasing scalable accent-emotion disentanglement, with direct application in South Asian EdTech and accessibility software.
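
The abstract reports accent accuracy as a Word Error Rate (WER) reduction. As a minimal sketch of how such a figure is commonly computed (not the authors' evaluation code), the Python below calculates WER as the word-level edit distance divided by the reference length, plus the relative reduction between two WER values. The function names `word_error_rate` and `relative_reduction` and the sample sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Illustrative helper (hypothetical name), using plain whitespace
    tokenization; real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement when an error rate drops from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Made-up transcripts: one substituted word out of four.
    print(word_error_rate("namaste let us talk", "namaste let us walk"))
    # WER values quoted in the abstract, as percentages.
    print(round(100 * relative_reduction(15.4, 11.8), 1))
```

Under this standard relative-reduction formula, a drop from 15.4% to 11.8% works out to roughly 23.4%, close to but not identical to the 23.7% quoted; the abstract does not state how its figure was derived.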

artificial intelligence, machine learning, natural language

2506.1631

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)

A review-based study on different Text-to-Speech technologies

Chowdhury, Md. Jalal Uddin, Hussan, Ashab

arXiv.org Artificial Intelligence, Dec-17-2023

This research paper presents a comprehensive review-based study on various Text-to-Speech (TTS) technologies. TTS technology is an important aspect of human-computer interaction, enabling machines to convert written text into audible speech. The paper examines the different TTS technologies available, including concatenative TTS, formant synthesis TTS, and statistical parametric TTS. The study focuses on comparing the advantages and limitations of these technologies in terms of their naturalness of voice, the level of complexity of the system, and their suitability for different applications. In addition, the paper explores the latest advancements in TTS technology, including neural TTS and hybrid TTS. The findings of this research will provide valuable insights for researchers, developers, and users who want to understand the different TTS technologies and their suitability for specific applications.

international conference, review-based study, speech

2312.11563

Country: Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.97)
Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.70)

Less-known facts about AI voices and text-to-speech

#artificialintelligence, Jun-8-2022, 03:05:34 GMT

Voice artificial intelligence is an emerging technology that lets humans interact with machines through voice commands. The technology is seeing tremendous growth and intense research as engineers explore untapped areas. We are well accustomed to hearing AI voices narrate articles and reports in a monotone. Among the most popular examples of their use are Alexa- and Siri-enabled devices. These devices are gaining significant recognition, and the market for similar products is growing rapidly.

ai voice, text-to-speech technology, tts technology

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.54)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.54)
Information Technology > Artificial Intelligence > Assistive Technologies (0.54)

How innovations in voice have made it an end-to-end commerce channel

#artificialintelligence, Mar-2-2021, 14:03:48 GMT

Text-to-speech (TTS) technology isn't exactly new – but the way it's shaping the future certainly is. From smart speakers to voice assistants, TTS is increasingly central to day-to-day interactions between brands and end users, leading to enhanced brand experiences and better business outcomes. Until recently, TTS was confined to a specific use case: voice-enablement of written content to make computers 'speak' to those with visual or reading impairments. TTS technology was based on utility and a need to make screen-related content accessible. As such, synthetic speech traditionally sounded robotic and was marred by poor audio quality and an unnatural speaking style.

consumer, conversational ai, voice experience

Technology: Information Technology > Artificial Intelligence > Speech (0.59)
